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Quantitative evaluation of the performance of 
discrete-time reservoir computers in the forecasting, 
filtering, and reconstruction of stochastic stationary 

signals 

Lyudmila Grigoryeva^, Julie Henriques^’^, and Juan-Pablo Ortega^’* 


Abstract 

This paper extends the notion of information processing capacity for non-independent input 
signals in the context of reservoir computing (RC). The presence of input autocorrelation makes 
worthwhile the treatment of forecasting and filtering problems for which we explicitly compute this 
generalized capacity as a function of the reservoir parameter values using a streamlined model. The 
reservoir model leading to these developments is used to show that, whenever that approximation is 
valid, this computational paradigm satisfies the so called separation and fading memory properties 
that are usually associated with good information processing performances. We show that sev¬ 
eral standard memory, forecasting, and filtering problems that appear in the parametric stochastic 
time series context can be readily formulated and tackled via RC which, as we show, significantly 
outperforms standard techniques in some instances. 


Key Words: Reservoir computing, echo state networks, liquid state machines, time-delay reservoir, 
memory capacity, forecasting, filtering, stationary signals, separation property, fading memory property. 


1 Introduction 

Reservoir computing is a recent but already well established neural computing paradigm [Jaeg 01, 
Jaeg 04, Maas 02, Maas 11, Croo 07, Vers 07, Luko 09] that has shown a significant potential in over¬ 
coming some of the limitations inherent to more standard Turing-type machines. This computa¬ 
tion approach, also referred to in the literature as Echo State Networks and Liquid State Ma¬ 
chines, is characterized by a simple and convenient supervised learning scheme, even though its 
performance presents as a weak side a substantial sensitivity to architecture parameters. This fea¬ 
ture explains the development in the literature of various linear and nonlinear memory capacity mea¬ 
sures [Jaeg 02, Whit 04, Gang 08, Herm 10, Damb 12] as well as the study of different signal treatment 
properties (see [Yild 12, Luko 09] and references therein) that are used to characterize and measure the 
information processing abilities of these devices in order to be able to optimize them. 

We have proposed several contributions in this direction in our previous works [Grig 15b, Grig 15a] 
in the context of RGs constructed via the sampling of the solutions of a time-delay differential equation. 
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These RCs are usually referred to as time-delay reservoirs (TDRs). More specifically, in [Grig 15b] 
we constructed a simplified model for those specific RCs that allowed us to provide a functional link 
between the RC parameters and its performance with respect to a given memory task and which can 
be used to accurately determine the optimal reservoir architecture by solving a well structured opti¬ 
mization problem. The availability of this tool simplifies enormously the implementation effort and 
sheds new light on the mechanisms that govern this information processing technique. This approach 
was extended in [Grig 15a] in order to be able to handle multidimensional input signals and real-time 
multitasking [Maas 11], that is, the simultaneous execution of several memory tasks. Additionally, we 
used this approach to estimate the memory capacity of parallel arrays of reservoir computers. This 
reservoir architecture had been introduced in [Orti 12, Grig 14a], where it was empirically shown to 
exhibit various improved robustness properties. 

The notion of capacity is defined using independent input signals, which immediately limits its 
practical functionality in several aspects. Indeed, the use of independent inputs makes empty of content 
the treatment of forecasting problems. Additionally, most input signals that need to be processed 
in specific tasks exhibit sizable autocorrelation, which automatically precludes independence. Finally, 
simple numerical experiments show that optimal reservoir architectures with respect to a given memory 
task lose that optimality as soon as the input signal ceases to be independent. 

All these facts call for a generalization of the notion of capacity suitable for correlated signals and 
for techniques to compute it. This is the main goal of this work. More specifically, we use an extension 
of the RC model introduced in [Grig 15b] in order to generalize the memory capacity formulas that were 
introduced in that paper to non-independent strictly stationary signals. Moreover, the presence of input 
autocorrelation makes worthwhile the treatment of forecasting and filtering problems for which we will 
extend the notion of capacity and that we will explicitly compute as a function of the reservoir parameter 
values. These results can be readily used in the execution of specific tasks since the expressions that we 
obtain are written in terms of various statistical features of the input and the teaching signal that can 
be simply estimated out of the training sample. 

The results in this paper are formulated for general discrete-time RCs that are not necessarily TDRs. 
We will use the generalization of the model in [Grig 15b] to this context in order to show that, for that 
approximation, RCs satisfy the so called fading memory and separation properties that are typically 
associated to good information processing performances (see [Yild 12, Luko 09] and references therein). 

We conclude the paper with a section in which we show that several memory, forecasting, and 
filtering problems that appear profusely in the context of parametric stochastic time series models can 
be readily formulated and tackled via RC which, as we show, outperforms in some instances standard 
techniques in that setup. 

Notation: column vectors are denoted by bold lower or upper case symbol like v or V. We write 
to indicate the transpose of v. Given a vector v G K", we denote its entries by Vi, with j € {1,..., n}; 
we also write v = The symbols and 0„ stand for the vectors of length n consisting 

of ones and zeros, respectively. We denote by the space of real n x m matrices with m, n G N. 

When n = m, we use the symbol M„ to refer to the space of square matrices of order n. Given a matrix 
A G M„_m, we denote its components by Aij and we write A = (Aij), with i G {1,..., n}, j G {1, ■ ■ • m}. 
We write I„ and 0„ to denote the identity matrix and the zero matrix of dimension n, respectively. We 
use Sn to indicate the subspace S„ C M„ of symmetric matrices, that is, S„ = {A G M„ | A^ = 4l}. 
Finally, the symbols E[-], var(-), and Cov(', •) denote the mathematical expectation, the variance, and 
the covariance, respectively. 
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vention 2013G-5493), the ANR “BIPHOPROC” project (ANR-14-OHRI-0002-02), and Deployment S.L. 
LG acknowledges financial support from the Faculty for the Future Program of the Schlumberger Foun¬ 
dation. 
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2 Reservoir computing with stationary input signals 

We start by spelling out in detail the main dynamical property of the input signals that we consider in 
this work. Let {z{t)}^^’^ be a one-dimensional stochastic time series, that is, for any time t G Z, the 
value z(t} G K is the realization of a univariate random variable. 

Definition 2.1 The time series is said to be strictly stationary if the joint distributions of 

the multivariate random variables {z{ti ),..., z{tk))^ and {z{ti + h),..., z{tk + h))^ are the same for 
all positive integers k and for all ti,... ,tk,h G Z. 

Definition 2.2 Given a time series {z{t)}^^^, ri,... ,rk,k G N and t,h 2 , ■ ■ ■ ,hk G h we define the 
corresponding higher order automoment {t, h 2 ,..., hk) as 


(t, /i 2 ,..., hk) ■■= E [z{t)'^^z{t + h 2 )''^ ■■■z{t + hkY *^], (2.1) 

together with the convention 

M?(t)=E[z(t)’'^)]. 

It is straightforward to show that the symbols in (2.1) satisfy the following two reduction properties: 

• If hi = 0 then: 


hz 


it, h2,...,h„...,hk) = it, h2,..., h,_i, h,+i, ...,hk). 


• If hi = hj Y 0 with i < j then: 


h. 


ri,...,rk 


it,h2 


, hk) = + ^t, h2, . . . , h„ . . . , h,_i, h, + i, ...,hk). 


The following proposition is a direct consequence of Definition 2.1. 

Proposition 2.3 Let {zit)}^^^^ be a stochastic time series whose higher order automoments exist. If 
{z(t)}jg 2 is strictly stationary then its higher order automoments are time-independent. In that case, 
we replace the notation in (2.1) by 

ih 2 , ■ ■ ■ ,hk) :=R[zitY^ zit-\-h 2 )'^'^ ■ ■ ■ zithkY'^] for any tGZ. (2.2) 

Remark 2.4 We emphasize that strict stationarity is a significant generalization of the standard no¬ 
tion of stationarity formulated in terms of the time-independence of the order one and two automo¬ 
ments [Broc 02] (mean and autocovariance functions), also referred to as second order stationarity. 
A case in which both notions coincide is when the process {-z(t)}tgz is Gaussian, that is, when for any 
ti, ... ,tk G Z and k G N, the distribution function of ( 2 (^ 1 ), • ■ ■, zitk)) is a multivariate Gaussian. ■ 


2.1 The RC setup for signal forecasting, filtering, and reconstruction 

2.1.1 The tasks 

In this work we study the performance of reservoir computing in the handling of three different signal pro¬ 
cessing tasks, namely forecasting, filtering, and reconstruction, that we now describe in detail. Let 
{ 2 :(t)}jg 2 {yY)}te 2 one-dimensional stochastic time series that will be called in what follows 

the input and teaching signals, respectively. The goal of any machine learning based signal treatment 
strategy consists of using finite size realizations := {^(1),..., ziT)} and yx ■= {2/(1), ■ • ■, viT)} of 
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and {2/(i)}tgz, respectively, in order to train a device that is capable of reproducing out-of- 
sample realizations y^/ of the teaching signal out of a corresponding realization of the input signal x^,. 
The pairs {ztjYt) and {z'j,, ,y'rp,) are referred to as training and testing samples, respectively. 


If we use the mean square error E 


iyit)-y{t)y 


as a loss function to measure the difference 


between the machine output y{t) and the actual value y{t) of the teaching signal that we seek to 
reproduce, a general result shows (see for example Section 4.1 in [Kami 94]) that this machine learning 
task amounts to a nonparametric estimation of the conditional expectation E [y{t) where JFt is the 
information set generated by the input signal up to time t, that is, Tt = a {zit), z(t — 1),...); the symbol 
a {z{t), z{t— 1),...) denotes the sigma-algebra generated by the random variables {z{t), z{t — 1),... }. 
We will distinguish three different, but possibly overlapping situations, that will be illustrated later on 
in Section 3: 


(i) Forecasting and reconstruction: in this case the teaching signal is a function, in general non¬ 

linear, of the input signal. More specihcally, we define a (/, /i)-lag forecasting/reconstruction 
task as a function H : —>• K that is used to generate a one-dimensional signal y{t) = 

H {z{t + f),..., z{t),... ,z{t — h)) that depends on the value of the input signal / lags into the 
future (forecasting part) and h lags into the past (reconstruction part). Examples of this task are 
presented in Subsections 3.1 and 3.2. 

(ii) Filtering: it is a generalization of the previous case in which the input and teaching signal exhibit 

statistical dependence but do not necessarily have a deterministic functional dependence. An 
example of this task is presented in Subsection 3.3. 


2.1.2 The reservoir computing setup and its capacity 

The reservoir computing construction that we consider in this work is based on the choice of a nonau- 
tonomous discrete-time dynamical system of the form: 

x{t) = F(x{t — l),l{t),0), with t e Z, x(t), I(t) G and 0 G (2-3) 

The map F : x x is called the reservoir map. The vector x(t) is referred to as the 

neuron layer at time t and each of its components Xi{t) are its neuron values. The vector 9 G 
contains the set of parameters that the reservoir map depends on. The vector I(t) G R^ is the input 
forcing of the reservoir that is constructed out of the input signal {z{t)}^^^, z{t) G K, by using an 
input mask c G R^ and by setting I(t) := cz{t). 

Example 2.5 Time-delay reservoirs (TDRs): This RC setup [Roda 11, Guti 12] is based on the 
sampling of the solutions of a time-delay differential equation (TDDE) of the form 

x{t) = -x{t) + f{x{t - T),I{t),e), (2.4) 

with time-delay r and where f : R x R x —>■ K is called the kernel map. A reservoir map 

F : R^ X R^ X R^ like in (2.3) can be constructed in this case by considering the Euler time- 

discretization of (2.4) with integration step d := t/N, with N the number of neurons of the desired RC 
setup, namely, 

^^—— = -xit) +f{x{t-T),Iit),e). (2.5) 

d 

Next, we use an input mask c G R^ to multiplex the input signal over the delay period by setting 
I(t) := cz{t) G R^ . We then organize it, as well as the solutions of (2.5), in neuron layers x(t) 
parametrized by a discretized time t G Z which yields 

Xi{t) := e~^Xi-i{t) + {l-e~^)f{xi{t-l),Ii{t),6), with xo(t) := XNit-1), and ^ := log(l-|-(i), (2.6) 
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where Xi{t) and Ii{t) stand for the jth-components of the vectors x(t) and I(t), respectively. The value d 
is referred to as the separation between neurons. The reservoir map (2.3) is obtained by using (2.6) 
in order to write down the neuron values of the layer for time t in terms of those for time t — 1 and the 
current input signal value. More specifically: 

xi{t) = e~^XN{t - 1) + {I - e~^)f{xi{t - l),Ii{t),e), 

X 2 {t) = e-‘^^XN{t- 1 ) + (1 - e"«) {e-^f{xi{t- l),Ii{t),e) + fix 2 {t- l),l 2 {t),e)} , 

: (2.7) 

XNit ) = e ~^^ XN{t - 1 ) + {1 - e ~^^ f { xN-jit - l ), lN - 3 { t ), e ), 

which corresponds to a description of the form 


x(t)=F(x(t-l),I(t),0), (2.8) 

that uniquely determines the reservoir map F : X X . Physical implementations 

of this scheme carried out with dedicated hardware are already available and have shown excellent 
performances in the processing of empirical data: spoken digit recognition [Jaeg 07, Appe 11, Larg 12, 
Paqu 12, Brun 13], the NARMA model identification task [Atiy 00, Roda 11], continuation of chaotic 
time series, and volatility forecasting [Grig 14a]. ■ 


As we explained in the previous subsection, a task is assigned to the RC by fixing a teaching signal 
{y(<)}jg 2 by minimizing the mean square error committed at the time of reproducing it with an 
affine combination of the reservoir output x(t) of the form W^x(t) + a, with a G K and W G K^. The 
optimal pair (Wout, Oout) is referred to as the readout layer and is obtained by solving the ridge (or 
Tikhonov) regularized regression problem 


(Wout, Oout) : = arg min 

WGR*',aGi 


(E[(WTx(t) + a-y(t))' 


+ AIIWII 


A G 


(2.9) 


whose solution is given by 


Wout =(r(0) + AlAr)-'Cov(y(t), x(t)), 
flout =E[j/(t)] - 


In this expression, p,,, := E [x(t)] is the expectation of the reservoir output and 


r(0) := E 


(x(t) 


fj-x) (x(t) - 


( 2 . 10 ) 

( 2 . 11 ) 


is the lag-zero auto covariance exhibited by the reservoir output. We show in the next subsection that 
if the input signal is strictly stationary then the two moments and r(0) are time-independent. The 
mean square error committed by the reservoir when using the optimal readout is: 


E 


(Wjut • x(t) -f flout - y{t)f = Wj,^tr(0)Wout + var {y{t)) - 2Wj,^tCov(y(t), x(t)) 


= var (y(t)) - Cov(?/(t),x(t)) ' (r(0)-f AIat) Rr(0)-b 2AlAr)(r(0)-f AIat) ^Cov(y(t), x(t)). (2.12) 


The reservoir capacity C{6, c. A) is defined as one minus the mean square error that we just computed, 
normalized with the variance of the teaching signal 


c{e,c,x) 


Cov(t/(t),x(t))T(r(0) + Aljv)-i(r(0) + 2AIv)(r(0) + AIv)-iCov(z/(t),x(t)) 

var(?/(t)) 


(2.13) 
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We emphasize that this capacity is a natural generalization to the context of non-independent stationary 
input signals of the notion introduced in [Jaeg 02, Whit 04, Gang 08, Herm 10, Damb 12]. Additionally, 
we point out that C{6, c, A) depends on, apart from the reservoir parameters 6, the input mask c, and 
the regularization constant A, also on the task determined by the teaching signal {y{t)}^^j^. It is worth 
noting that since the normalized error coming from (2.12) is bounded between zero and one, it is clear 
that 0 < C{6, c. A) < 1. 

2.2 The reservoir model 

The capacities (2.13) for a reservoir of the form (2.3) are in general very difficult to compute analytically. 
The standard approach to determine them consists hence in fixing a triple (0,c, A) and in estimating 
the corresponding reservoir capacity C{9,c,X) via Monte Carlo simulations. This strategy makes the 
finding of the optimal parameters for a given task computationally very expensive. 

In [Grig 15b] we introduced an approximate model for the time-delay reservoirs (TDRs) presented 
in Example 2.5 that made possible an analytic estimation of their capacities under a very strong inde¬ 
pendence hypothesis in the input signal; this condition was already present in the original definitions of 
this notion [Jaeg 02, Whit 04, Gang 08, Herm 10, Damb 12]. These developments provided a functional 
link between the TDR parameters and its performance with respect to a given reconstruction task which 
was used to accurately determine the optimal reservoir architecture by solving a well posed optimization 
problem. 

In this section we extend that construction to more general RCs driven by strictly stationary input 
signals (see Definition 2.1). The approximate model of the RC in (2.3) is obtained, as in [Grig 15b], by 
partially linearizing F with respect to the self-delay at a stable fixed point Xq G of the autonomous 
system associated to (2.3). This condition means that the point Xq € is chosen so that F{xq, On, 9) = 
Xq, 9 G and for which the spectral radius p (A(xo, 9)) < 1, with A(xo,9) := D^F(xq,0n,9), in 
order to ensure stability. In [Grig I5b] we provided both theoretical and empirical evidence that suggests 
that optimal reservoir performance can be achieved when working in a statistically stationary regime 
around a stable equilibrium. The stability of the point Xq implies, in passing, that the reservoir states 
x(t) remain close to xq, and hence justifies approximating the reservoir (2.3) by the series expansion: 

„ i factors 

x{t) = F(xo, Oat, 9) + D^F{xo, On, 9){x{t - 1) - Xq) + ^ -D^"^F{xo, Oat, 9) I(t) G • • • G I(t) 

= Xo -k A(xo, 9)(x{t - 1) - Xq) -k e{t), (2.14) 

where e(t) G is a vector whose entries are polynomial functions of the input signal z(t) at t. More 
specifically 

i? ^ i? ^ 

e{t) = ^ -D^*V(xo, Oat, 9)I{t) G • • • G l{t) = ^ -D^j‘^F{xo, On, 9) {cz{t)) G • • • G (cz(t)) 

= Oat, 0) (c G • • • 0 c) = {z{t), c),...,q^ {z{t), c))^ , (2.15) 

where (•, c) is a polynomial of degree R G N whose monomial of order i has as coefficient the value 

-D["'’Fj{xo,ON , 0) (c 0 • • • 0 c) with Fj is the j-th component of the map F := {Fi,..., Fn) in (2.3). 
v. '• -^^ 

i factors 

In what follows, we will refer to the recursion 

x{t) = Xo -k A(xQ,9){x{t - 1) - Xq) -k e{t) 


( 2 . 16 ) 
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as the approximate reservoir model or just the reservoir model. We now notice that the strict 
stationarity of {z(t)}t(£z implies that of {e{t)}t^z- In particular, 

Me := E [e(t)] = {{q]i (x, c)) (/r J , • • • , (x, c)) , (2.17) 

where the symbol (x, c)) (^j,) stands for the evaluation of the polynomial g^ (x, c) according to the 
following convention: any monomial of the form Urx'' is replaced by ^ similar convention can be 

used to write down the autocovariance Ts{h), h G Z of {e(<)}(gz. Indeed, for any i,j e {1,... ,N}: 

= E [e\t)e\t + h)] - pIpI = (g)^ (x, c) • g^ (y, c)) {p'^'(h)) - (2.18) 

where the symbol • denotes polynomial multiplication and the first summand stands for the evaluation 
of the bivariate polynomial g)^ (x, c) • (y, c) according to the following convention: any monomial of 

the form ar,sx'^y’^ is replaced by ar,syl’^(h), with p^y^ the second order automoment of {z{t)}tez- 

The following proposition, whose proof can be found in the appendix, shows that the strict station¬ 
arity of the input signal implies the second order stationarity of the output {x(t)}tgz of the approximate 
reservoir (2.16). 


Proposition 2.6 Let {x(t)}(g 2 be the output of the reservoir model (2.16). Suppose that the spectral 
radius p{A{:kq,6)) < 1 and that the input signal {z{t)}t^z is strictly stationary and has finite automo¬ 
ments up to order 2R (R is the order of the expansion that defines the reservoir model (2.16)/ Under 
those hypotheses, the reservoir output {x(t)}(g 2 is second order stationary (see Remark 2.4) and the 
corresponding time-independent mean autocovariances E are given by: 

Mx := E [x(t)] = xo -b (In - ^(xq, e ))~^ (2.19) 


T(h) :=E 


(x(t) 


Mx) (x(t + h) 



A^r,ik-j-h){A’^y , hGZ, 

j,k=0 


( 2 . 20 ) 


with andTg, provided by (2.17) and (2.18), respectively. Under these hypotheses, the recursion (2.16) 
that determines the reservoir model can be rewritten as 


(x{t) - pj = A(xo, e)(x(t - 1) - p^) -b {e{t) - p^) , (2.21) 

and 

OO 

x(t) = p^ + Y^ A(xo, ey {e{t - j) - P^j , (2.22) 

3=0 

is the unique stationary solution of (2.21). 


2.3 Reservoir capacity estimations for signal forecasting, reconstruction, 
and filtering 

We now use the reservoir model introduced in the previous section in order to provide estimates for 
the different information processing tasks that we described in Section 2.1.1. All along this section, we 
assume that the input signal {z(t)}t£z is strictly stationary and has finite automoments up to order 2R 
so that we can use the results contained in Proposition 2.6. 
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2.3.1 Linear and quadratic forecasting and reconstruction 

The linear case. Consider the linear forecasting/reconstruction task H : —)• K determined 

by the assignment 

= {z(t + z{t),..., z{t - h)) G I— !■ 

with L G We construct the teaching signal by setting y{t) := L^z-f’^{t). Notice first that 

Py := E [ 2 /(t)] = PzLJ if+h+i- We now estimate the memory capacity Ch[0,c,\) associated to the 
task H and exhibited by the reservoir model (2.21). Notice that the evaluation of the expression (2.13) 
requires the computation of the lag-zero autocovariance r(0) of the reservoir output in terms of the 
reservoir parameters, as well as var(y(t)) and Cov(?/(t),x(t)). The expression for r(0) is explicitly 
provided by (2.20); regarding var(?/(t)) and Cov(?/(t),x(t)) we have: 

• var(y(t)) = E [y{tf] - [z-^’^(t)z/’'‘(<)^] L - = L^(r^ - plif+h+iij+h+i)^^ 

e Sf+h+i dehned by 

^Ij = hl’Hi- j), with i,j G + h + l}, (2.23) 

and the second order automoment of the input signal. 

• Cov{y{t),x{t)): consider the representation in (2.22) of the unique stationary solution of the 
reservoir model, that is, 

OO 

(x(t) - = A(xo, eyp{t - j), (2.24) 

1=0 

with pit) := e{t) — Consequently 


Cov{y{t),x{t)) 


/ + ^ + l 

Cov {Ljzit + f+ 1-j),xit)) 

1=1 

/ + /l + l C 50 

LjA{xo,e)'^E [{zit + f + l-j)- Pz) {£{t - k) 

j—1 fc—0 
f+h+l OO 

j—1 k—O 


(y.c)) {py if + k + i 
\{x»qR iy,c)) (pI’- if + k + i 


-Me)] 

j)) 

j)) 


(2.25) 


where this expression has been written using the same convention as in (2.18). 


The quadratic case. Consider the quadratic forecasting/reconstruction task H : i K de¬ 

fined by the assignment z-^’^(f) —i z^^it)^Qz-f^it), with Q G Sj+h+i- We then define the teaching 
signal 

/ + ^ + l 

yit) := H iz^-^it)) = Y Qi,jzit +f+ l-i)zit +f+ 1-j). (2.26) 

j,l=i 


This implies that 


f+h+l 

Pv ■= E [yit)] = Y - j)- 


bl=i 


(2.27) 
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At the same time 


f+h+l 


= X! Q^, 3 Qk,lz{t +f+ l-i)z{t +f+ 1-j)z{t +f+ l-k)z{t^ f+ 1-1), 


/+^+l 


and hence 

^[yitf]= X! QrdQk,iyY'^'^{i-j+-k,i-i). 

Consequently, by (2.27) and (2.29), 

var(j/(t))= QijQk,ipl'^’^'^{i-j,i-k,i-l)- I Y - J) 

i,j,k,l=l \ z,i=l 


(2.28) 


(2.29) 


In order to compute Cov (y(t),x(t)), we use again the representation (2.24) and hence 


Cov (y(<), x(t)) = Cov (y(t), x(t) - p,J = Y 0)''Cov {y{t),p{t - k)) 

k^O 

oo 

= Y ^(^ 0 ’ {yii)s(.t - k)] - PyPe] 


oo Z+Zi+l oo 

= Y Yl A{xo,9)'^Q,^jE[z{t + f + l-i)z{t + f + l-j)£{t-k)]-pyYM^o,d)''fJ-e 

k—0 iJ — 1 k—0 

(x • y • {z, c)) (/ibb- (i- j^i-k- f -1))' 


oo f+h+1 

= Y Y Qhi^i'^o.O) 

k—0 i,j — l 


,(x • y • {z, c)) (yi’i’- {i - j,i - k - f - 1)) ^ 


-Aiy^A(xo,0)'^>^. 

(2.30) 


fc=0 


2.3.2 Filtering of stochastic costationary signals 

This case is a generalization of the previous one in which the input and teaching signal exhibit statistical 
dependence, even though they do not necessarily have a deterministic functional link. This statistical 
relation is used by the RC in order to construct a nonparametric estimation of the conditional expec¬ 
tation E [y(f) |Jq], where Tt is the information set generated by the input signal up to time t, that is, 
IFt = a {z{t), z{t — 1),...). As we already explained in Subsection 2.1.1, this conditional expectation 
minimizes the mean square error committed by the RC at the time of reproducing the teaching signal. 
We start by introducing the following definition. 

Definition 2.7 Let {z(t)}tei, and {y(t)}tgz be two one-dimensional stochastic time series. Given r € N 
and h G Z we define the higher order comoment as 

h) := E [y{t)z{t + hf] . (2.31) 

If the higher-order comoments up to order r exist and are time-independent, we say that {y(t)}tgz and 
{z(t)}tgz are rth-order costationary and we note 

■= E [y{t)z{t + h)^], for any t G Z. 


(2.32) 
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Suppose now that {z(t)}t ^2 is the input of the RC and {y(<)}tgz is a teaching signal defining a 
specific filtering task. As we did all along this section, we assume that the input signal is strictly 
stationary and has finite automoments up to order 2i?; additionally we suppose that { 2 (t)}tgz and 
{yit)}tei, are costationary of order R. 

With these assumptions, we can explicitly spell out the performance of the RC in the filtering task 
by using (2.13) and by noting, first, that var(y(t)) can be estimated out of the teaching signal and 
second, that by (2.24): 


Cov {y{t), x(t)) = Cov (y(t), x(t) - /x^) = ^ A(xo, eyCov {y{t), e{t - j) - p.^) 

1=0 


OO OO 

= X! ^(^0’ oy Co-w {y{t), £{t - j)) = ^ A(xo, oy 
1=0 1=0 


/{x»q\ (u,c)) {py^A-j)) 
V(a:*(7^ {u,c)) {pyj-j)) 


- hzPe : 

(2.33) 


where the expression (x • {u, c)) (yPy^^{—j)') stands for the evaluation of the polynomial x • q}^ {u, c) 

on the variables x and u, according to the following convention: each monomial of the form axvf is 
replaced by apl „{-j). 

We emphasize that given the input and teaching signals {z{t)}t^i, and {y{t)}tez, respectively, the 
higher order comoments can be estimated out of the training sample and inserted in (2.33). When this 
expression, together with the estimate of var {y{t)) is substituted in (2.13), we obtain an estimate of the 
RC capacity for any value of its parameters 6 and the input mask c. 


2.4 The fading memory and the separation properties 

The fading memory and the separation properties that we describe later on in this section have been 
identified in the context of reservoir computing to be in relation with good information processing 
performances (see [Yild 12, Luko 09] and references therein). The goal of the following paragraphs is 
showing that the reservoir model (2.16) for the discrete-time reservoir computer (2.3) exhibits these 
features under reasonable assumptions on the reservoir map. 

Definition 2.8 Consider the discrete-time reservoir map (2.3). 

(i) We say that the reservoir map (2.3) satisfies the uniform fading memory property (UFMP) 

whenever for any e > 0 there exist Jg > 0 and lig S N such that if for any two input signals 
{z(t)}jgg, {z'{t)}^^^ the relation |z(s) — z'(s)| < Sg holds for all s € [t — he,f\, t G Z, then the 
corresponding outputs x(t), x'(f) are such that ||x(t) — x'(t)|| < e. The values 6g > 0 and hg gN 
corresponding to a given e > 0 are the same for any t G Z. 

(ii) We say that (2.3) satisfies the separation property (SP) if for two input signals {z(t)}^^’^, 

{z'(t)}jg 2 that differ only at some time point s G 'L, that is, z(s) y z'[s), the corresponding 
outputs satisfy that x(t) y x'(t) for any t > s. 

The proof of the following two results can be found in the appendices at the end of the paper. 

Theorem 2.9 Consider the reservoir model (2.16) driven hy the real valued and non-necessarily sta¬ 
tionary input signal {z{t)}^^j^. 

(i) Let c G be an input mask and I(<) := cz{t) the corresponding input forcing. Let 

^ i factors 

Ff (I(t), xo, 6 ) := Ow, 0) I(t) 0 • - 0 I(t) 
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be the Rth-order Taylor series expansion of the reservoir map F at the point (xo,Ojv,0) with 
respect to the input forcing I(<). Assume that one of the following conditions holds: 

(a) The map F/^(-,xo,0) : —)■ is injective. 

(b) The input signal is bounded, that is, there exists k G such that |z(t)| < k, for all t G Z, 
and the map Fp is injective in the set B = {iG ||I|| < ||c||A:}. 

If additionally, the linear map A(xo,0) := Zlx-F(xo, Ojv, 0) : —>■ has no zero eigenvalues, 

then the reservoir model satisfies the separation property. 

(ii) Suppose that the input signal {z(i)}tgz is strictly stationary with finite automoments up to order 
2R and that it is bounded, that is, there exists k G K’*' such that |^(t)| < k for all t G Z. If 
additionally the linear map A(xo,0) is such that ||A(xo,0)|| < 1, with || • || some matrix norm 
induced from , then the reservoir model (2.16) satisfies the uniform fading memory property. 

Remark 2.10 This result can be easily extended to multidimensional input signals, that is, {z{t)}^^^, 
z{t) G K". In that case (see [Grig 15a] for the details) the RC is constructed by using an input mask 
c G Mjv.n that takes care not only of the temporal, but also of the dimensional multiplexing by setting 
I(t) := cz{t). The only additional hypothesis needed in that situation is that the rank of c has to equal 
n in order to conclude part (ii) of the theorem. 

The following result contains a statement analogous to that of Theorem 2.9 in the particular case 
of the time-delay reservoirs introduced in Example 2.5. In that situation, some hypotheses are either 
automatically satisfied or can be formulated in a simplified manner. 


Corollary 2.11 Consider a time-delay reservoir of the type introduced in Example 2.5 with nonlinear 
kernel f : KxMxK^ —> K, parameters 9 G and a non-necessarily stationary input signal {z{t)}^^^, 
z{t) G M. 


(i) Let c G be an input mask and l{t) := cz(t) the corresponding input forcing. Let fp{I{t),xo, 9) := 

EiLi f){xo, 0,9)1 {ty be the Rth-order Taylor series expansion of the kernel map f at the 

point (xq, 0, 9) with respect to the input forcing I(t). Assume that one of the following conditions 
holds: 


(a) The map fP{-,xo,9) : K —>■ K is injective; 

(b) The input signal is bounded, that is, there exists k G such that |z(t)| < k, for all t G Z, 

and the map fy{-,Xo,9) is injective in the set S = {/ G M | |/| < IlCjlfc}. 

Then the corresponding TDR model satisfies the (SP). 

(ii) Suppose that the input signal {z(t)}tez is strictly stationary with finite automoments up to order 

2R and that it is bounded, that is, there exists k G K’*' such that \z(f)\ < k, for all t G Z. If the 

partial derivative dxf{xQ,0,9) of the nonlinear kernel f evaluated at the point {xo,0,9) satisfies 
the condition \dxf{xo,0,9)\ < 1, then the TDR model satisfies the (UFMP). 


3 Examples 

This section describes examples in connection with the three different tasks listed in Subsection 2.1.1, 
namely, forecasting, reconstruction, and filtering. The first two examples take place in the context of 
two parametric stochastic time series families (ARMA and GARCH). As we will see, forecasting in 
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those two setups comes down, in the terminology introduced in Section 2.3, to solving a linear and 
a quadratic forecasting task for which the performance of RC has been already empirically evaluated 
in [Grig 14a]. The last example falls in the category of pure filtering. In that case we will show how 
a specihc type of RC is capable of outperforming two standard filtering techniques (Kalman filtering 
and the hierarchical-likelihood approach). Additionally, we will evaluate the accuracy of the capacity 
formulas introduced in Section 2.3 and based on the reservoir model (2.16) at the time of estimating 
the performance of the actual RC. 


3.1 Multistep forecast of ARM A temporal aggregates 

Consider the causal and invertible ARMA(p,q) specification (see [Box 76, Broc 06], and references 
therein for details) determined by the equivalent relations 

OO OO 

^{L)z{t) = &{L)({t), z{t) = - i), C{t)=Y - j)> {CWI ^ IID(0, a^), (3.1) 

2—0 j—0 

where L is the backward shift operator defined by L{z{t)) := z{t — 1), ^>( 2 :) := 1 — (j)iz — ■ ■ ■ — (j)pzP, 
&iz) := l+eiz+- ■ ■+0qz'i, ’l'(z) #(z)-i0(z), n(z) := @{z)-^^{z), and the symbols $(L) and 0(L) 

stand for the operators 4*(A) := 1 — 4>iL — • • • — (j)pLP and 0(A) := 1 -b 9iL -f • • • -b OqL'^, respectively. 
Consider now an aggregation vector w € It can be shown [Grig 14b, Grig 15c] that given a 
realization zt of the process {z(t)}tg up to time T, the best forecast z'^{T -b /) (in the sense that it 
minimizes the mean square forecasting error) of the temporal aggregate z^ (T +f) = J2i=i ^ +*) 

based in the information set generated by z^, is given by 

/ T+ 2 —1+r / T+ 2 —1+r OO 

^{T+f) = Y Y = 51 51 (^- 2 ) 

2=1 j—1 i—1 j—1 k—0 


with r := maxjp, g}. The mean square forecasting error MSFE (^z'^(T -b f )j associated to this forecast 
is [Grig 14b]: 


MSFE 


(P(T-b/)) = E (^z^{T + f)- z^{T + 


f i-l /-I / i-l 

wf-^+iWf-j+i Y — i 
2=1 Z=0 2=1 l—O 


2+Z 


This task can be solved via RC by using as input signal either the innovations C{t) or the time 
series values z{t) up to time T and using as teaching signal the values z'^(t), also up to time T. In 
the terminology introduced in Section 2.3, in both cases this forecasting problem amounts to a linear 
task. For example if z{t) is used as teaching signal, the linear task map H : K-f — )■ M is given by 
zf{t) = {z{t -b /),..., z(t + 1)) I—^ v^^zf{t) 


The RC performance can be evaluated in this case by using the formula (2.13) and the elements 
introduced in Section 2.3.1, as long as the necessary automoments exist. That is the case whenever the 
innovations {^(t)}tgz are Gaussian because in that situation the input signal {z(t)}tgz is a Gaussian 
process (see [Broc 06]) and hence all the automoments are finite and can be readily computed [Holm 88, 
Tria 03]. 
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3.2 Multistep forecast of temporally aggregated GARCH volatilities 

In this section we study the forecasting of the volatility associated to a flow-type aggregated sample 
generated by a GARCH(1,1) process [Engl 82, Boll 86] and we show that it can be encoded as a quadratic 
forecasting task of the type described in Section 2.3.1. More specifically, consider the process {z{t)}t^z 
determined by 

z(t) = a{t)({t) with ^(t) ^ IID(0, (T^) and a‘^{t) = + aiz{t — + Pa(t — , (3.3) 

where a(t) stands for the positive square root of cr^(t). The constants 00,01 and /? are positive real 
numbers subjected to the constraints oq >0, ai,j3>0 and ai + P < \ that ensure the existence of a 
unique stationary solution and the positivity of the conditional variance process {a{t)^}t^ 2 , that is by 
construction predictable. GARCH processes are profusely used in the hnancial econometrics literature 
in the modeling of the conditional volatility associated to the time evolution of asset returns. 

A problem of much importance in financial risk and portfolio management applications is the fore¬ 
casting of the variance var^ {zT+f[f]) of the flow aggregate ZT+f[f] ■= z{T -I-1) -I- • • • -|- z{T -|- /) based 
on the information set Tt generated by a realization Zt of the process {z{t)}t^z up to time T and the 
conditional volatility cr(T). It can be shown [Henr 14] that in the GARCH(1,1) context this forecast is 
given by 


varr (^r+/[/]) = E [zT+flf? \ tFr] 


foio 


1 ~ (oi + P) 


+ aiT + lf- 


oq 


\ 1 - (oi + pY 

p)j l-(ai-k/3)’ 


(3.4) 


where a{T -I- 1) = (oq -I- aiz{T)'^ -I- /3cr(T)^) ^. The estimation of this forecast can be solved via RC by 
using {z(t)}t^z as input signal and corresponding teaching signal 


y{t) := (z(t + 1) + 


+ zit + f)f — ^ 

fci + /c2 H- \~^f —2 


ki,k2 


z{t + lf^z{t + 2f^---z{t + ff 


... 

where the summation in the second equality is taken over all sequences of nonnegative integer indices 
ki,... ,kf, such that the sum ki + ■ ■ ■ + kj = 2. This can be encoded as a quadratic forecasting task 
with a task matrix Q G Sf whose coefficients are given by the expression (3.5). The evaluation of the 
RC performance in this task using the formula (2.13) and Section 2.3.1 is only possible when the higher 
order automoments of the process (3.3) exist. This feature is by no means guaranteed in the GARCH 
context and imposes various semialgebric constraints on the coefficients ao, ai, p. See [Li 01, Ling 02] 
for a characterization of the higher order moments of the GARCH family. 

Time-delay reservoir (TDR) computers have shown in [Grig 14a] excellent empirical performances 
at the time of carrying out this task when using as data generating process the VEC-GARCH family 
introduced in [Boll 88] , which is a multivariate generalization of the GARCH family. In that work, TDRs 
were also shown to outperform standard parametric multivariate volatility models in the forecasting of 
actual market realized volatility. 


3.3 Filtering of autoregressive stochastic volatilities 

In this example we consider the autoregressive stochastic volatility (ARSV) model [Tayl 86] determined 
by the linear state-space prescription 


z{t) = r + a{t)({t), {C(t)}tez ~ nD(0,1) 

b{t) = X +ab{t - 1) + w{t), {w(t)}tez ~ HD(0, cr^) 


(3.6) 
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where b{t) := \og{a{t)^), A is a real parameter, and a G (—1,1). We will additionally assume that the 
innovations {C(t)}tGZ and {w{t)}t^z are independent. It is easy to prove that the unique stationary 
process {z{t)}t^z induced by (3.6) and available in the presence of the constraint a G (—1,1) is a 
white noise (the returns have no autocorrelation) with finite moments of arbitrary order. Moreover, the 
unconditional variance cr^ of the stationary process {&(t)} is given by 


and if the innovations {C(0} and {w(t)} are Gaussian, then the unconditional variance and kurtosis of 
the process {y{t)} are given by 


var(z(t)) = E[cr(t)^] = exp 


A 

1 — a 


2 ^ 


2 

b I 


and kurtosis {z{t)) = 3 exp (cr^) . 


(3.7) 


Moreover, it can be shown [Tayl 86] that whenever is small and/or a is close to one then the 
autocorrelation y{h) of the squared returns at lag h can be approximated by 


7(/i) 


exp(tTg) -1 /1 

3exp(cr^) - 1 


The main difference of the ARSV model (3.6) with respect to the GARCH family is that, in this 
case, the volatility process {cr(t)}tgz is a non-observable, non-predictable stochastic latent variable that 
cannot be written as a function of previous realizations of the observable variable z{t) and the volatilities 
(j{t). Many procedures have been developed over the years to go around this difficulty whose solution is 
needed, in particular, to estimate the model parameters. In this section we will focus in only two them 
that are profusely used in the literature. First, the specific form of the prescription (3.6) corresponds 
to a state-space model in which the observation equation is the one that yields {z(t)}t^z and the state 
equation rules the time evolution of b{t) := log((T(t)^). This observation makes appropriate the use 
of the Kalman filter [Harv 94] to obtain estimations of the conditional log-variances b{t) based on the 
observed values z{t). The other method that we will use as a benchmark is the hierarchical-likelihood 
method [Lee 96, Lee 06, Cast 08, Lim 11] (abbreviated in what follows as h-likelihood) that incorporates 
the unobserved volatilities as an unknown variable at the time of writing a likelihood that is optimized 
and that takes into account the observed time series values z{t). 

In the RC context, the problem of estimating the unobserved volatility a{t) out of the observed 
values of z{t) up to time t, can be easily encoded as a filtering problem of the type characterized in 
Section 2.1.1 and for which the RC performance was studied in Section 2.3.2 by using the reservoir 
model. Indeed, it suffices to take z{t) as input signal and as teaching signal y{t) the functional form of 
the volatility that we are interested in. Both the Kalman filter and the h-likelihood methods are designed 
to produced optimal (linear in the case of Kalman) estimations of the the log-variance log(tT(t)^), which 
is a limitation to which RC is not exposed. 

In the paragraphs that follow we carry out an empirical exercise in this context in order to compare 
the performance of the RC with that of Kalman and h-likelihood, and also to evaluate the accuracy of 
the capacity formulas introduced in Section 2.3 and based on the reservoir model (2.16) at the time of 
estimating the performance of the actual RC. 

We proceed by using a time-delay reservoir of the type described in the Example 2.5 constructed 
with the so-called Ikeda kernel map given by the expression: 


f{x,I,e) = rysin^ {x + yl + (j)), 6 := (7,7, </) G 


(3.8) 


The architecture of the reservoir chosen contains 40 neurons and an input mask c G that was 
randomly constructed with values uniformly distributed in the interval [—1,1]. 
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We present to this TDR the filtering tasks consisting on estimating four different functions of the 
volatility a{t) generated by an ARSV model with parameters r = 3.9 • 10“"^, = 0.675, A = —0.821, 

and a = 0.9. The four different teaching signals used are yi{t) := (j{t), y 2 {t) ■= cr{t)^, ysit) := log(cr(t)), 
and y 4 {t) := log(cr(t)^). Given a fixed input mask c, the reservoir parameters 0 are optimized with 
respect to each of these four filtering tasks. In this case, the optimal parameters were the same for 
the four cases, namely, 7 = 2.866, (p = 1.124, rj = 0.461, and d = 0.839; we recall that d := t/N is 
the separation between neurons. Table 1 presents the performances (in terms of the normalized mean 
square error (NMSE)) exhibited by the TDR in the execution of the four filtering tasks and compares 
them with those attained using the Kalman filter and the h-likelihood approaches. The figures in the 
table show that these two benchmarks outperform the RC at the time of filtering the functions of the 
volatility (logarithm) that they have been designed for but when it comes to providing the values of the 
actual instantaneous volatility or variance, it is the RC that performs the best. 


Stochastic volatility filtering performance (NMSE) 




Teaching signal proposed/Task solved 




Instantaneous 

Instantaneous 

log of Instantaneous log of Instantaneous 



volatility 

variance 

volatility 

variance 

Filtering Method 

h-likelihood 

0.476 

0.730 

0.411 

0.411 

Kalman 

0.536 

0.812 

0.429 

0.429 

Reservoir Method 

Reservoir computer (TDR) 

0.437 

0.594 

0.655 

0.655 

Reservoir model 

0.453 

0.661 

0.652 

0.601 


Table 1: Performances (in terms of the normalized mean square error (NMSE)) exhibited by the TDR in the execu¬ 
tion of four volatility filtering tasks compared with those attained using the Kalman filter and the h-likelihood 
approaches. 


We finally evaluate in the context of this filtering task the accuracy of the capacity formulas intro¬ 
duced in Section 2.3. Figure 1 depicts the error surfaces associated to the filtering of the instantaneous 
volatility a{t) of the same ARSV data generating process that we considered in the construction of 
Table 1. The left panel has been computed using Monte Carlo simulations in order to empirically eval¬ 
uate the filtering error of the Ikeda TDR as a function of the parameter rj in (3.8) and of the distance 
between neurons. The right panel was obtained by evaluating the formula (2.13) based on the reservoir 
model (2.16) with the help of the elements introduced in Section 2.3.2 and a nonlinearity of order R = 8. 
The two surfaces clearly resemble each other and, more importantly, exhibit their minima at virtually 
the same parameter values. This proves that, as it was already shown in [Grig 15b, Grig 15a] for inde¬ 
pendent signals, that the theoretical model can be efficiently used to determine the optimal reservoir 
architecture in the presence of strictly stationary inputs. 


4 Appendices 

4.1 Proof of Proposition 2.6 

We start by emphasizing that the potential lack of independence in the input signal {z{t)}t^z implies that 
{e(t)}tgz is in general not a white noise. Consequently, unlike the situation encountered in [Grig 15b, 
Grig 15a], the recursion (2.16) does not determine a standard VAR(l) model. Nevertheless, since by 
hypothesis p{A{xo,9)) < 1, the proof of the existence of a unique stationary solution for a VAR(l) 
model can be mimicked in this case (see, for example. Section 2.1.1 and Proposition C.9 in [Lutk 05]) 
in order to show that the recursion 


x(t) - xo = A(xo,0)(x(t - 1) - xo) -f e(t). 


( 4 . 1 ) 
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Empirical Ikeda TDR performance 
Task: instantaneous volatily filtering 


Theoretical estimation based on the reservoir model 
Task: instantaneous volatily filtering 



•1 




Figure 1: Error surfaces associated to the filtering of the instantaneous volatility C7(f) an ARSV data generating process. 

The left panel has been computed using Monte Carlo simulations in order to empirically evaluate the filtering 
error of the Ikeda TDR as a function of rj in (3.8) and of the distance between neurons. The right panel was 
obtained by evaluating the formula (2.13) based on the reservoir model (2.16) with a nonlinearity of order R = S. 
The two surfaces have minima at virtually the same parameter values. 


has a unique second order stationary solution given by 

OO 

x(t) =Xo + ^A(xo,0)*e(t-i). (4.2) 

i=0 

Taking expectations in both sides of (4.2) and using the stationarity of e{t) yields (2.19) (see also 
Proposition C.IO in [Lutk 05]). It is straightforward to verify using that expression that (4.1) is identical 
to (2.21) whose unique stationary solution is given by (2.22) and which hence coincides necessarily with 
(4.2). Finally, using the definition of F and (2.22), the expression (2.20) follows. ■ 

4.2 Proof of Theorem 2.9 

Proof of part (i) We start by modifying the notation introduced in (2.15) in order to specifically 
indicate the dependence of e(t) on the input signal and on the input mask. We set: 

R 2 

c) := V Oat, 0) (c (g) • • • (g) c). 

^ ^ z! ^_✓ 


i factors 


(4.3) 
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Note now that the map e{-, c) : K —that assigns e{z, C) to the input signal 2 is the composition 
of cz and the map Fp-{-,Xo, 6) : —>• in the statement. Consequently, if FP{-, xq, 0) satisfies 

any of the two injectivity hypotheses in (a) or (b) in the statement, then the map s{-,c) is necessarily 
injective, that is, 

e{z,c) ^ e{z',c), whenever z ^ z'. (4.4) 

Let now {z{t)}t^z, {z'(t)}tez be two input signals that are identical except at s G N, that is, 
z(s) ^ z'{s). We show by induction that the corresponding reservoir outputs x(t), x'(<) are such that 
x(t) ^ x'(t) for any t > s. First, since z{t) = z'{t) for any t < s, then x(t) = x'(t) for any t < s 
necessarily. Now by (2.16) and using the notation introduced in (4.3), we have: 

x(s) - x'(s) = A(xo, 9){x{s - 1) - x'(s - 1)) + e(z(s), C) - e{z'{s), C) = e{z{s),C) - e{z'{s), C). 

Given that z(s) yf - 2 '(s), the hypotheses (a) or (b) in the statement of the theorem together with (4.4) 
imply that x(s) x'(s). We now establish the induction step by assuming that x(t) y^ x'(t) for some 
t> s and we show that x(t + 1) y^ x'(t + 1). Indeed, in this case 

x(t + 1) - x'(t + 1) = A(xo, 0)(x{t) - x'(t)). (4.5) 

Since by hypothesis zero is not an eigenvalue of A(xo, 0), we have that x(t + 1) y^ x'(t + 1) necessarily 
because x(t) — x'(t) y^ 0 by the induction hypothesis. 

Proof of part (ii) Let ei > 0 and h G N. The continuity of the map e(-, c) defined in (4.3) implies that 
there exists (5(ei) > 0 such that if two input signals {z(t)}tez, {z'(t)}tez are such that |z(s) — 2 :'(s)| < 
i5(ei) for all s G [t — h,t], then ||£(z(s), c) — e{z'{s), c)|| < £1, for all s G [t — h, t]. Additionally, since by 
hypotheses the input signals are bounded, then ||e( 2 ;(t), c)|| < ATmax and ||e(z'(t), c)|| < ATmax, for all 
t G Z and for some ATmax G M. 

Let now x{t) and x'(t) be the reservoir outputs corresponding to z{t) and z'{t), respectively, then 
x(t) - x'(t) = A(Xo, 9){x{t -l)-x'{t- 1)) + £{z{t), c) - £{z'{t),c). 


Since by hypothesis ||A(xo,0)|| < 1 and the spectral radius p{A{xo,9) satisfies that p{A{xo,9) < 
||A(xo,0)|| for any matrix norm, we have that p{A{xq,9) < 1. Additionally, since the input signals are 
strictly stationary with finite automoments up to order 2R, we can use Proposition 2.6 to rewrite this 
expression as 

00 

x(t) - x'{t) = A(xo, 9y{£{z{t - i), c) - £{z'{t - i), c)) (4.6) 

which implies that 


c(t) - x'(<)|| P(xo,0)|r||£(z(f- f),c) - £{z'{t- i),c)\\ + 2A:max Y ll^(^0’^)ir‘ 




i=h-\-l 


WlMlx flilb I £1 + ( 2 A:max - ei)||A(xo,e)||''+^ 


i=0 


Finally, the hypothesis ||A(xo,0)|| < 1 implies that limft_,.oo ||A(xo, 0)||^'''^ = 0 and hence since £1 > 0 
can be made as small as desired, so is £, which proves the statement. I 
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4.3 Proof of Corollary 2.11 

Proof of part (i) In order to obtain this result as a corollary of part (i) in Theorem 2.9 we need 
to prove the following two statements. First, that the injectivity hypotheses on fp{-,xo,0) imply 
those about the corresponding map F/^(-,Xo,6) in Theorem 2.9. Second, in this case the linear map 
4(xo, 6) := DxF(jx.q,Oi^, 6) : -p has no zero eigenvalues. 

The first statement can be proved by noting that, as a consequence of (2.7): 


/ 


Ff(I,Xo,0) = (l-e-«) 


e~^fPiIi,xo,9) + fp{l2,xo,0) 
(^~‘^^fp{h,xo,9) + e~^f^{h,xo,e) + fp{l3,xo,9) 


\ 


\ e ^'>^fP{Ii,xo,9) + e fR(^i^^xo,9) + -h/f (Ijv, a;o, 0) / 

(4.7) 

This expression can be used to to easily show recursively that F/^(-,xo, 0) is injective if fP{-,xo,9) is 
injective. Concerning the second point. Theorem D.ll in [Grig 15b] proves that zero is not an eigenvalue 
of 4(xo,0). 


Proof of part (ii) It follows from part (ii) in Theorem 2.9 by noting that \dxf{xo,0,9)\ < 1 implies 
(see the proof of Theorem D.IO in Supplementary Material of [Grig 15b]) that 


||4(xo,0)||oo<l. ■ 
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